A place for everything, everything in its place - Benjamin Franklin
READMEs are great, but don’t document something if you could just make that thing self-documenting by definition.
Pick a strategy, any strategy, just pick one and stick to it!
Each kind of file gets its own place:
- Ready-to-analyze data
- Raw data
- R scripts, plus the Markdown files from “Compile Notebook”
- The figures created in those R scripts and linked in those Markdown files
- A linear progression of R scripts, and a Makefile to run the entire analysis
- Tab-delimited files with one row per gene of parameter estimates, test statistics, etc.
- Files to help collaborators understand the model we fit: some Markdown docs, a Keynote presentation, and the Keynote slides exported as PNGs so they can be viewed on GitHub
knitr & RMarkdown. GOOD ENOUGH!
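To make the "one row per gene" results file concrete, here is a minimal Python sketch (the analysis itself writes these files from R). The column names `gene_id`, `estimate`, `t_stat` and `p_value` are hypothetical, and the rows are toy values for illustration only. The point is that a plain tab-delimited file with a header round-trips with nothing more than the standard library.

```python
import csv
import io

# Hypothetical column names; real files hold whatever estimates and
# test statistics the fitted model produces.
FIELDS = ["gene_id", "estimate", "t_stat", "p_value"]

def write_gene_table(rows, handle):
    """Write a header line, then one tab-delimited row per gene."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

def read_gene_table(handle):
    """Read the table back as a list of dicts (values come back as strings)."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Toy rows for illustration only, not real results.
rows = [
    {"gene_id": "gene_a", "estimate": 1.5, "t_stat": 3.2, "p_value": 0.002},
    {"gene_id": "gene_b", "estimate": -0.4, "t_stat": -0.9, "p_value": 0.37},
]

buf = io.StringIO()          # stands in for a file on disk
write_gene_table(rows, buf)
buf.seek(0)
table = read_gene_table(buf)
```

Because the format is open and line-oriented, the same file opens in R, a spreadsheet, or a diff tool without any conversion.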
from_joe directory
In a README or in comments in my R code: whatever makes it easiest for me to remind myself of a file’s provenance, if it came from the outside world in a state that was not ready for programmatic analysis.
from_joe, where I don’t force myself to keep the same standards with respect to file names and open formats.
Here’s how most data analyses go down in reality:
- You get raw data
- You explore, describe and visualize it
- You diagnose what this data needs to become useful
- You fix, clean and marshal the data into ready-to-analyze form
- You visualize it some more
- You fit a model or whatever and write lots of numerical results to file
- You make prettier tables and many figures based on the data and results accumulated by this point
Both the data file(s) and the code/scripts that act on them reflect this progression.
The R scripts:
01_marshal-data.r
02_pre-dea-filtering.r
03_dea-with-limma-voom.r
04_explore-dea-results.r
90_limma-model-term-name-fiasco.r
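Because the numeric prefixes are zero-padded, a plain text sort of these names already gives the run order. The Python sketch below relies on that. It also assumes (this is not stated in the slides) that a prefix of 90 or above marks a side investigation, like the "fiasco" script, rather than a step in the main pipeline. The function name and threshold are hypothetical.

```python
import re

# Two-digit prefix, underscore, descriptive lowercase slug, ".r" extension.
SCRIPT_PATTERN = re.compile(r"^(\d{2})_[a-z0-9-]+\.r$")

# Assumption: prefixes >= 90 are side excursions, not pipeline steps.
SIDE_THRESHOLD = 90

def order_scripts(names):
    """Split script names into (pipeline, side) lists, each in run order."""
    pipeline, side = [], []
    for name in sorted(names):  # zero-padded prefixes sort correctly as text
        match = SCRIPT_PATTERN.match(name)
        if match is None:
            raise ValueError(f"unexpected script name: {name}")
        if int(match.group(1)) >= SIDE_THRESHOLD:
            side.append(name)
        else:
            pipeline.append(name)
    return pipeline, side

scripts = [
    "04_explore-dea-results.r",
    "01_marshal-data.r",
    "90_limma-model-term-name-fiasco.r",
    "03_dea-with-limma-voom.r",
    "02_pre-dea-filtering.r",
]
pipeline, side = order_scripts(scripts)
```

The zero padding is what makes this work. Without it, `10_...` would sort before `2_...`, because text sorting compares the characters `1` and `2` first.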
The figures left behind:
02_pre-dea-filtering-preDE-filtering.png
03_dea-with-limma-voom-voom-plot.png
04_explore-dea-results-focus-term-adjusted-p-values1.png
04_explore-dea-results-focus-term-adjusted-p-values2.png
...
90_limma-model-term-name-fiasco-first-voom.png
90_limma-model-term-name-fiasco-second-voom.png
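Each figure name starts with the name of the script that produced it, so a figure's provenance can be recovered mechanically. Here is a minimal Python sketch that maps a figure file back to its script by matching the script name, minus its extension, plus a hyphen. The function name `source_script` is hypothetical.

```python
from pathlib import PurePath

SCRIPTS = [
    "01_marshal-data.r",
    "02_pre-dea-filtering.r",
    "03_dea-with-limma-voom.r",
    "04_explore-dea-results.r",
    "90_limma-model-term-name-fiasco.r",
]

def source_script(figure, scripts=SCRIPTS):
    """Return the script whose name (minus extension) prefixes the figure
    name, or None if no script matches."""
    name = PurePath(figure).name
    matches = [s for s in scripts if name.startswith(PurePath(s).stem + "-")]
    # Prefer the longest stem in case one script name is a prefix of another.
    return max(matches, key=lambda s: len(PurePath(s).stem), default=None)
```

For example, `source_script("02_pre-dea-filtering-preDE-filtering.png")` returns `"02_pre-dea-filtering.r"`. A figure whose name does not follow the convention returns `None`, which makes stray files easy to spot.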
File organization should reflect inputs vs outputs and the flow of information
/Users/jenny/research/bohlmann/White_Pine_Weevil_DE:
drwxr-xr-x 20 jenny staff 680 Apr 14 15:44 analysis
drwxr-xr-x 7 jenny staff 238 Jun 3 2014 data
drwxr-xr-x 22 jenny staff 748 Jun 23 2014 model-exposition
drwxr-xr-x 4 jenny staff 136 Jun 3 2014 results
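Reusing this top-level layout for a new project takes only a few lines. The Python sketch below uses the four directory names from the listing above. The function name `scaffold` and the use of a temporary directory as the project root are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Top-level directories taken from the listing above.
PROJECT_DIRS = ["analysis", "data", "model-exposition", "results"]

def scaffold(root):
    """Create the project directories under root (existing ones are kept)
    and return the sorted names of the directories now present."""
    root = Path(root)
    for name in PROJECT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())

# Demo in a throwaway location; the project folder itself is created too.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "White_Pine_Weevil_DE")
```

Because of `exist_ok=True`, running it again on an existing project changes nothing.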